Abstract

In order to extend the results obtained with minimal lattice models to more realistic systems, 
we study a model where proteins are described as a chain of 20 kinds of structureless amino 
acids moving in a continuum space and interacting through a contact potential controlled by a 
20 x 20 quenched random matrix. The goal of the present work is to design and characterize amino 
acid sequences folding to the SH3 conformation, a 60-residue recognition domain common to
many regulatory proteins. We show that a number of sequences can fold, starting from a random
conformation, to within a distance root mean square deviation (dRMSD) of 2.6 Å from the native
state. Good folders are those sequences displaying in the native conformation an energy lower than 
a sequence-independent threshold energy. 



I. INTRODUCTION 



A number of models have been studied in the last twenty years to describe the folding
mechanism of single-domain proteins to their unique, biologically active native conformation.
All-atom models with semi-empirical potentials provide a realistic description of
proteins but are computationally too demanding to be useful for studying their folding.
A class of simplified models focuses on an accurate description of the geometry of
the protein but employs minimal potential functions. This is the case of Go models, where
the potential function sums a constant negative term for each native contact in the protein
conformation. In these models the native conformation is by definition the ground state of
the system and it is usually possible to perform extensive folding simulations. These models
are used to study some features of selected proteins, such as the conformational properties
of the transition state. On the other hand, the contribution of the different parts of the
protein to its kinetics and thermodynamics is controlled mainly by the entropic term (the
energetic term being trivial), and the frustration [6] of proteins is underestimated.

Lattice models use the opposite approach, accounting in a minimal way for the geometry
of the protein chain, but focusing on the heterogeneity of the interactions. The protein is
represented as a chain of beads sitting on the vertices of a lattice and interacting through a
non-trivial contact matrix. These models account for the frustration of the system, allow the
study of the amino acid sequences folding to a given model structure (and, consequently,
of the effect of mutations, of the natural evolution of protein sequences, etc.) and are
computationally quite economical. These models are used to understand the physical basis
of the folding process, trying to answer questions such as: what makes a protein display
a low entropy in its equilibrium state? What makes a protein fold fast [10,11]? What
differentiates a good folder from a random sequence? On the other hand, lattice
models cannot describe the fold of real proteins with its richness of secondary structures
and motifs.

The importance of the heterogeneity in the interaction between amino acids rests on the
fact that analytical calculations made on random sequences with a replica approach have
shown [9] that there is a threshold degree of heterogeneity which separates two qualitatively
different behaviours of the system. For a low degree of heterogeneity any model chain behaves
essentially as a globular homopolymer, displaying at any temperature an equilibrium state
populated by many (i.e., exponentially many with respect to the chain length) different
conformations. Within this context it is not possible to find protein-like sequences with
a unique native state. At high heterogeneity, sequences with few dominant conformations
appear and a fraction of them have a unique equilibrium conformation. These are the
candidates for the role of good folders.

Simulation studies based on lattice models have highlighted
a simple energetic criterion to distinguish good from bad folders: a sequence will fold to a
given native conformation if it displays, on that conformation, an energy $E_N$ lower than a
threshold energy $E_c$, which depends only on the statistical moments of the interaction
matrix and on the length of the chain. The physical meaning of $E_c$ is that of being the
lowest conformational energy a random sequence can have, an energy which has a well-defined
value as a consequence of the frustration of the system. This condition ($E_N < E_c$) goes
further than ensuring the thermodynamic uniqueness of the native state. In fact, it is also at
the basis of the kinetic ability of the protein to reach the native state on short call [10,11].

An energy minimization of the sequence, keeping the protein conformation fixed, to energies
below $E_c$ is thus a practical algorithm to design good folders. This procedure has been
applied to the design of lattice model proteins. A more efficient and thermodynamically
rigorous method consists in optimizing the conformational free energy of the sequence in
the protein conformation, thus also taking into account the competing conformations. This
method was introduced in refs. [12,13] and further developed in later works.

The purpose of the present work is to build a model which is more realistic than both Go
and lattice models, including continuous degrees of freedom as well as a non-trivial potential 
function. We show that this model allows sequences to have a unique, stable and kinetically 
accessible native state, and that such sequences obey the same energetic requirement as 
lattice-model sequences do. With the help of this model we will investigate the folding of 
selected sequences into the SH3 domain. 

II. THE PROTEIN MODEL 

The model we investigate describes a protein as an inextensible chain, where amino acids
are represented by spherical beads centered on the $C_\alpha$ atom, thus allowing a realistic
accounting for the protein backbone (cf. Fig. 1). Each of the 20 types $\sigma$ of amino acids is
characterized by a hard core radius $R_{HC}(\sigma)$. The values of $R_{HC}(\sigma)$, which range from 2.25 Å
to 3.39 Å, are listed in Table I. The bond angles are limited to the interval between $\pi/2$
and $0.8\pi$ so as to give some amount of stiffness to the polypeptide chain.

The potential energy of the protein depends on the positions $\{r_i\}$ and sequence $\{\sigma_i\}$ of
amino acids according to

$$U(\{r_i\},\{\sigma_i\}) = \sum_{i<j+1} B_{\sigma_i\sigma_j}\,\theta\big(R(\sigma_i,\sigma_j) - |r_i - r_j|\big), \qquad (1)$$

where $\theta$ is Heaviside's step function, $R(\sigma_i,\sigma_j) = k \cdot (R_{HC}(\sigma_i) + R_{HC}(\sigma_j))$ is the range of
interaction, which depends on the kind of amino acids proportionally to their hard-core radii
($k = 0.721$), and $B_{\sigma\pi}$ is the interaction energy between an amino acid of kind $\sigma$ and one of
kind $\pi$. The matrix $B_{\sigma\pi}$ is built out of quenched [27] random numbers taken from a Gaussian
distribution with mean $B_0 = 0.25$ and standard deviation $\sigma_B = 0.52$ (in arbitrary units).
A slightly positive mean contact energy has been chosen because it leads to sequences with
better folding properties than those associated with a contact matrix displaying $B_0 < 0$.
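
As an illustration of how the contact potential of Eq. (1) can be evaluated, the following Python sketch builds a quenched Gaussian contact matrix and sums the contact energies of a given conformation. It is only a minimal sketch under stated assumptions: the function names are hypothetical, the matrix is assumed to be symmetric, and bonded neighbours are assumed to be excluded from the sum (`min_sep=2`); none of these choices is spelled out in the text. The hard-core radii must be supplied by the caller, since the values of Table I are not reproduced here.

```python
import math
import random


def make_contact_matrix(rng, n_types=20, mean=0.25, std=0.52):
    """Quenched Gaussian contact matrix B[a][b], symmetrised (assumption)."""
    B = [[0.0] * n_types for _ in range(n_types)]
    for a in range(n_types):
        for b in range(a, n_types):
            B[a][b] = B[b][a] = rng.gauss(mean, std)
    return B


def contact_energy(coords, seq, B, r_hc, k=0.721, min_sep=2):
    """Sum B over residue pairs closer than k * (R_HC(a) + R_HC(b)).

    coords  -- list of (x, y, z) bead positions
    seq     -- list of amino-acid type indices, one per bead
    r_hc    -- hard-core radius for each amino-acid type
    min_sep -- minimum sequence separation of interacting pairs
               (2 skips bonded neighbours; an assumed summation range)
    """
    energy = 0.0
    for i in range(len(seq)):
        for j in range(i + min_sep, len(seq)):
            cutoff = k * (r_hc[seq[i]] + r_hc[seq[j]])
            if math.dist(coords[i], coords[j]) < cutoff:
                energy += B[seq[i]][seq[j]]
    return energy
```

Because the interaction range depends on the types of the two residues, the set of contacts of a fixed conformation changes with the sequence, which is why the sketch recomputes the cutoff for every pair.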

The rationale behind the model is that the key ingredient to make a protein fold is the 
heterogeneity existing among the different amino acids. We account for this heterogeneity
by varying both the size of the amino acids and the contact energy acting among
them. Because no simple energy function capable of folding real protein sequences to their
native conformation is yet known, we shall use in the model calculations
an interaction potential parameterized by a random matrix. This choice has the advantage
of making the model results quite general. On the other hand, it has the drawback that the
labels of amino acids (A, C, etc.) are merely nominal. Since we are interested in the general 
aspects of the physical mechanism of protein folding and not in the detailed chemistry of 
particular sequences, this drawback is of no consequence for the present investigation. 

An important ingredient of the model is a constraint on the total number of contacts each
residue can build. This constraint reflects (together with the hard core radius) the size of the
given amino acid. Simulations performed without such a constraint led to a collapse of the
chain to unrealistically small sizes. The effect cannot be avoided by simply increasing the
hard core radius, since this value is limited from above by the average distance between two
consecutive amino acids along the chain ($\approx 3.8$ Å). This is a limit any model which pictures an amino
acid as a sphere will display. In keeping with the discussion carried out above, we assign
to each type of amino acid a quenched random number $n_{max}(\sigma)$, ranging up to 5, which
represents the maximum number of possible contacts the amino acid can make (cf. Table I).

In order to design sequences with a specified energy $E_{targ}$ on the SH3 target conformation
(Fig. 1(a)) we perform a sampling of the space of sequences. Starting from a random
sequence displaying an even concentration of amino acids (operatively, we have maintained
that of the wild-type Src SH3, although this choice has no deep meaning, due to
the merely nominal character of the amino acid letters), we keep the conformation fixed
and perform swap moves among the amino acids, so that their relative frequency remains
constant. Consequently, the average and standard deviation of the contact energy matrix
weighted by this frequency remain equal to those associated with the (unweighted) 20 x 20
matrix. Examples of sequences obtained with this algorithm are listed in Table II. We note
that, although wild-type sequences do display some amount of amino acid repetitions, the
designed sequences display unrealistic repeats of amino acids of the same kind (see Table
II). This is an artifact of the simplified spherical geometry of model residues, which causes
correlations among consecutive sites. In fact, whenever the $j$-th residue ($j$ being the site
index along the chain) interacts with residue $j^*$, then residues $j-1$ and $j+1$ are likely to
lie within the interaction range of $j^*$ as well. This effect is smaller in real proteins because
of the complicated geometry of sidechains. Since this artifact of the model does not hinder
the design of good folders, we will postpone the solution of this problem to
a future work, where the folding sequences will be analyzed from a bioinformatic point of
view.
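
The sampling of sequence space by swap moves described above can be sketched as follows. This is a hedged illustration, not the procedure actually used: the acceptance rule is assumed to be a Metropolis criterion at a fictitious selection temperature, and the names `sample_sequences_by_swaps`, `energy_fn` and `temperature` are hypothetical. The function `energy_fn` stands for any routine returning the energy of a sequence on the fixed target conformation.

```python
import math
import random


def sample_sequences_by_swaps(seq, energy_fn, n_steps, temperature, rng):
    """Metropolis sampling of sequences at fixed amino-acid composition.

    seq         -- list of amino-acid type indices (not modified in place)
    energy_fn   -- callable: energy of a sequence on the fixed conformation
    temperature -- fictitious selection temperature (assumed parameter)
    Returns the final sequence and its energy.
    """
    seq = list(seq)
    energy = energy_fn(seq)
    for _ in range(n_steps):
        i, j = rng.sample(range(len(seq)), 2)
        if seq[i] == seq[j]:
            continue  # swapping identical residues changes nothing
        seq[i], seq[j] = seq[j], seq[i]
        new_energy = energy_fn(seq)
        delta = new_energy - energy
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy = new_energy              # accept the swap
        else:
            seq[i], seq[j] = seq[j], seq[i]  # reject: undo the swap
    return seq, energy
```

Since only swaps are attempted, the composition of the sequence is conserved by construction. In such a scheme, lower selection temperatures would yield sequences with lower energy on the target conformation, so that different values of $E_{targ}$ could be obtained by varying this parameter.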

III. CONFORMATIONAL ANALYSIS OF THE LOWEST ENERGY SEQUENCE 

The first sequence analyzed is that displaying the lowest energy $E_{targ} = -37.80$ on the
SH3 conformation, labeled $s_1$ in Table II. We have performed Monte Carlo simulations
(each of $10^9$ steps) in conformational space at fixed temperatures, ranging from
0.05 to 1.0, starting each time from a random conformation. In 10 simulations at $T = 0.12$
the sequence $s_1$ finds each time an energy minimum displaying a dRMSD [28] to the SH3
target conformation (Fig. 1(a)) smaller than 3.3 Å and with the relative number $q$ of native
attractive contacts larger than 0.80. The minimum energy, that is the ground state energy
found in the 10 runs, is $E_{gs} = -46.96$ and is associated with a conformation displaying a
dRMSD of 2.6 Å and a $q$ of 0.85 (Fig. 1(b)). No conformation dissimilar from the native
conformation and displaying an energy lower than $-46.96$ is found in Monte Carlo simulations
at any temperature.
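
For concreteness, the dRMSD used as a structural similarity measure can be computed as in the following sketch. It assumes the usual definition, namely the root mean square difference between corresponding inter-residue distances in the two conformations, taken over all pairs of residues; the function name `drmsd` is chosen here for illustration.

```python
import math
from itertools import combinations


def drmsd(coords, native):
    """Distance RMSD between two conformations of the same chain.

    coords, native -- lists of (x, y, z) positions, one per residue.
    The average runs over all residue pairs i < j.
    """
    if len(coords) != len(native):
        raise ValueError("conformations must have the same length")
    squared = [
        (math.dist(coords[i], coords[j]) - math.dist(native[i], native[j])) ** 2
        for i, j in combinations(range(len(coords)), 2)
    ]
    return math.sqrt(sum(squared) / len(squared))
```

Being built from internal distances only, this quantity does not require any superposition of the two structures and is unchanged by rigid translations and rotations.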

The ground state energy is lower than the energy $E_{targ} = -37.80$ found in the sequence
minimization. This is because the ground state conformation displays $N_c = 74$ contacts,
while this number is 67 for the target conformation, and each of the 7 new contacts has on
average an energy of $-1.3$. Very-low temperature simulations ($T = 0.01$) starting from the
target conformation converge rapidly to the ground state conformation, thus indicating
that the two conformations belong to the same basin of attraction. The reason why the
target conformation is not exactly the ground state conformation of the system is that,
during sequence optimization, the native conformation is kept strictly fixed. If one were
interested in making the target and the ground state conformations coincide exactly, one
should allow some degree of conformational relaxation during sequence minimization, that
is, perform a minimization in the product space of sequences and conformations.
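
The energy balance quoted above can be checked with a one-line estimate, under the simplifying assumption that the 67 contacts of the target conformation keep their energies in the ground state conformation:

```latex
E_{gs} \approx E_{targ} + 7 \times (-1.3) = -37.80 - 9.1 = -46.9 ,
```

which is close to the value $-46.96$ found in the simulations.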

In Fig. 2 we display the dRMSD and the fraction [29] of native (attractive) contacts $q$
associated with sequence $s_1$ as functions of the number of Monte Carlo steps. As in the case
of Go-model and of lattice-model designed sequences, the protein wanders between unfolded
states (dRMSD $\approx 6$ Å) until it suddenly finds the native energy basin (dRMSD $< 3$ Å).
Because the conformational move implemented in the Monte Carlo simulation is a small flip
of a randomly selected amino acid, one could also view such a simulation as an approximation
to the dynamical trajectory. Note also that, unlike Go models, in the potential function
used in these simulations there is no information concerning the native conformation of the
protein.

During the simulations one observes a number of transitions from the folded to the
unfolded conformation and vice versa (Fig. 2). These transitions are reflected by changes
in energy, the mean energy difference between the two states being about 2 energy units
(i.e., $\sim 20\,kT$). On the other hand, it is difficult to distinguish between folded and unfolded
states from the fraction of native attractive contacts $q$ alone (lower panel in Fig. 2). This
fact can be regarded as an indication that not all contacts participate on an equal footing in
the stability of the native conformation.

The thermodynamics of sequence $s_1$ has been studied by means of a multicanonical sampling
algorithm. The probability distribution at $T = 0.12$ as a function of energy and
dRMSD is shown in Fig. 3. The plot shows a clear two-state behaviour. The centroids of
the two peaks are characterized by the values $E = -42$, dRMSD = 3.1 Å (native state) and
$E = -41$, dRMSD = 6.3 Å (unfolded state), respectively. The fact that at this temperature the
volumes defined by the two peaks are equal qualifies $T = 0.12$ as the folding temperature.
Note that the energy difference between the two peaks is only $\approx 8\,kT$, while the energy
fluctuations amount to $\sim 25\,kT$, and thus the energy distributions overlap considerably.
Consequently, it is difficult to identify the two states from the energy distribution of the
sequence in conformational space alone. In any case, the plot shows that the lowest energy that the
unfolded state can reach is $-42.5$.
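
The conversion between energy units and $kT$ used above can be made explicit, assuming that temperatures are expressed in energy units (i.e., $k = 1$):

```latex
\frac{\Delta E}{kT} = \frac{|-42 - (-41)|}{0.12} = \frac{1}{0.12} \approx 8.3 ,
\qquad
25\,kT = 25 \times 0.12 = 3.0 \ \text{energy units} .
```

The fluctuations are thus about three times larger than the separation between the centroids of the two peaks, which is why the two states cannot be resolved from the energy alone.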

The specific heat associated with sequence $s_1$ is displayed in Fig. 4 and shows two major
peaks at $T = 0.12$ and $T = 0.34$. These peaks indicate cooperative transitions and
will be further investigated in a subsequent work.
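
A specific heat curve of this kind can be obtained from the output of a multicanonical run by standard reweighting. The sketch below is an illustration under stated assumptions, not the analysis code actually used: it supposes that the microcanonical entropy $S(E)$ is available on a grid of energies, it uses units with $k = 1$, and the function name `specific_heat` is hypothetical.

```python
import math


def specific_heat(energies, entropies, temperature):
    """C_v = (<E^2> - <E>^2) / T^2 from a tabulated entropy S(E).

    energies, entropies -- values E_k and S(E_k) on a common grid
    temperature         -- temperature in energy units (k = 1 assumed)
    """
    # Canonical weight of each energy bin: exp(S(E) - E / T).
    log_w = [s - e / temperature for e, s in zip(energies, entropies)]
    shift = max(log_w)  # subtract the maximum to avoid overflow in exp
    weights = [math.exp(lw - shift) for lw in log_w]
    z = sum(weights)
    mean_e = sum(w * e for w, e in zip(weights, energies)) / z
    mean_e2 = sum(w * e * e for w, e in zip(weights, energies)) / z
    return (mean_e2 - mean_e ** 2) / temperature ** 2
```

Scanning `temperature` over a range and locating the maxima of the returned values would then identify the temperatures of the cooperative transitions.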

In Fig. 5 the mean dRMSD of $s_1$ and of $s_9$ (a random sequence) are displayed as
a function of the conformational energy. For energies larger than $-41$ the mean dRMSD
of the designed sequence $s_1$ is very similar to that of the non-designed sequence $s_9$, displaying
a wide plateau at dRMSD $\approx 5.7$ Å, corresponding to unfolded conformations. The designed
sequence $s_1$, on the other hand, displays a transition around $-41.5$ to dRMSD $\approx 2.8$ Å. This
allows us to define the native basin as the set of conformations displaying a dRMSD lower
than 4.2 Å, that is the midpoint of the transition, consistent also with the fluctuations
observed in Fig. 2 and with the transition state of Fig. 3.

IV. FOLDING PROPERTIES OF DIFFERENT SEQUENCES 

The conformational analysis has been repeated for nine other sequences displaying various
values of $E_{targ}$ (cf. Table II). For every sequence the average dRMSD obtained in 10
independent Monte Carlo simulations is reported as a function of $E_{targ}$ in the corresponding figure. One can
observe a monotonic behaviour up to $E_{targ} \approx -35$, where the dRMSD assumes a value of
4.2 Å, which we have defined in the previous Section as the threshold between native and
unfolded state. Moreover, the dRMSD associated with the ground state conformation is
listed in Table II. Sequences displaying $E_{targ}$ above $-35$ are neither able to find during the
simulations conformations similar to the target one, nor do they populate a set of structurally similar
conformations (cf. inset of Fig. 6). This result allows us to obtain, within the framework of
the model introduced in Sect. II, an energetic criterion to design protein-like sequences. In fact,
good folders onto the SH3 domain are those sequences displaying an energy $E_{targ} < E^c_{targ}$,
where $E^c_{targ} = -35$.
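
The energetic criterion can be applied directly to the values of $E_{targ}$ listed in Table II, as in the following minimal sketch (the names `TABLE_II`, `E_C_TARG` and `good_folders` are chosen here for illustration only):

```python
# (label, E_targ) pairs as listed in Table II
TABLE_II = [
    ("s1", -37.80), ("s2", -37.27), ("s3", -36.20), ("s4", -35.53),
    ("s5", -34.85), ("s6", -34.30), ("s7", -33.65), ("s8", -23.67),
    ("s9", -4.51), ("s10", 8.26),
]

E_C_TARG = -35.0  # threshold target energy quoted in the text


def good_folders(rows, threshold=E_C_TARG):
    """Labels of the sequences whose target energy lies below the threshold."""
    return [label for label, e_targ in rows if e_targ < threshold]
```

With the tabulated values the criterion selects the first four sequences, $s_1$ to $s_4$.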

While the dependence of the folding properties of a sequence on $E_{targ}$ is quite clear, the
dependence of these properties on $E_{gs}$ is less well-defined (cf. Table II). In fact, while
sequences with $E_{gs}$ well below $-44$ are guaranteed to fold (as in the case of $s_1$, $s_2$ and $s_3$), it
is difficult to assess the behaviour of sequences displaying values of $E_{gs}$ around $-44$ (see,
e.g., $s_5$ and $s_6$). The problem arises mainly because the minimum energy conformations
associated with these sequences display different numbers of contacts ($N_c$ varies between 67
and 74), and consequently the system can gain or lose an amount of energy of the order
of some $kT$ with ease. In any case, for the purpose of designing a good folder into a given
three-dimensional conformation what matters is the $E_{targ}$ criterion.

In Fig. 6 the conformational entropy of three protein-like sequences, namely $s_1$, $s_2$ and
$s_3$, is shown as a function of energy. For reference, the entropies of the non-folding sequences
$s_8$, $s_9$ and $s_{10}$ are also plotted. The entropy of high-energy states ($E > -20$) is very similar
for all sequences except the purely random ones ($s_9$ and $s_{10}$). This is consistent with the
idea that in high-energy conformations contact energies can be regarded as random, any
specific effect of the individual sequence being lost. In fact, the high-energy part of the plot
can be well approximated by means of the random energy model [20], where the total energy
$E$ of the system is described as the sum of $N_c$ uncorrelated stochastic contact energies, $N_c$
being the typical number of contacts in a globular conformation. The resulting entropy is

$$S(E) = S_0 - \frac{(E - N_c B_0)^2}{2 N_c \sigma_B^2}, \qquad (2)$$

where $S_0$ sets the zero of the entropy, and $B_0 = 0.25$ and $\sigma_B = 0.52$ are the mean and standard
deviation of the interaction matrix. The curve described by Eq. (2) is plotted with dots in
Fig. 6, fitting the values of $S_0$ and of $N_c$ (= 29) to the high-temperature part of the curves
obtained from the simulations. Below energy $E \sim -20$ the entropy of these sequences is
influenced by the specificity of the sequence, as evinced by the departure of the curves from
Eq. (2).
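
The random energy model estimate can be sketched numerically as follows. The sketch assumes the standard Gaussian form of the model, in which the sum of $N_c$ uncorrelated contact energies with mean $B_0$ and standard deviation $\sigma_B$ gives a parabolic entropy $S(E) = S_0 - (E - N_c B_0)^2 / (2 N_c \sigma_B^2)$. The function names are hypothetical, and the value of $S_0$ is a fit parameter that is not quoted in the text and must be supplied by the caller.

```python
import math


def rem_entropy(energy, s0, n_contacts=29, mean=0.25, std=0.52):
    """Parabolic random-energy-model entropy S(E) (assumed Gaussian form)."""
    return s0 - (energy - n_contacts * mean) ** 2 / (2.0 * n_contacts * std ** 2)


def rem_threshold_energy(s0, n_contacts=29, mean=0.25, std=0.52):
    """Energy below the maximum of S(E) at which the model entropy vanishes."""
    return n_contacts * mean - std * math.sqrt(2.0 * n_contacts * s0)
```

Within the random energy model, the energy at which the entropy vanishes is usually taken as an estimate of the lowest energy available to a random sequence, so `rem_threshold_energy` gives a rough model counterpart of the threshold energy discussed in the text.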

The random sequences $s_9$ and $s_{10}$, on the other hand, display as expected an entropy
function which is qualitatively different from that of the folding sequences. Furthermore, it cannot
be fitted satisfactorily with Eq. (2). This is somewhat unexpected, since a random sequence
should be described better than a designed sequence by the random energy model.

Moreover, the non-designed sequences not only display low-energy conformations structurally
dissimilar from the target conformation, but these conformations are also dissimilar
among themselves. The inset of Fig. 6 shows the distribution of dRMSD for a good and a bad
folder, calculated pairwise in a sample of 20 conformations displaying $E_{gs} < E < E_{gs} + 10\,kT$.
The centroid of the distribution associated with sequence $s_9$ lies at 10 Å (dashed curve),
indicating that the associated conformations are structurally very different.

As expected, the ground state energy of random sequences is higher than that of good
folders (e.g. $E_{gs} = -42.36$ and $E_{gs} = -39.36$ for $s_9$ and $s_{10}$, respectively). Consistently
with the results of lattice models, the mean ground state energy of random sequences is
approximately equal to the lowest energy of the unfolded state of a good folder ($\approx -42$, see
Fig. 2), and we shall call this energy $E_c$.

Consistently with these findings, a sufficient (but not necessary; cf. e.g. sequence $s_4$)
condition for any sequence to be a good folder is that it displays a ground state energy $E_{gs}$
much lower than $E_c$ (cf. Table II). The reason is that, since $E_c$ is essentially sequence-independent,
if a sequence displays $E_{gs} \ll E_c$, then its conformational ground state does
not have to compete with the sea of unfolded conformations. Differently from the case of lattice
models, this result is only partially predictive. While for lattice-model sequences the folding
requirement is just $E_{gs} < E_c$, in the present model the "much lower" requirement is important,
although not well-defined. The reason for this difference is that in the present
model there are considerable fluctuations in the number of contacts, which produce fluctuations
in the energy. In a lattice model, due to the discreteness of the degrees of freedom, this effect
is much smaller, and the overall behaviour is consequently clearer. On the other hand, one
can easily distinguish good from bad folding sequences on the basis of the target energy
$E_{targ}$ which, being calculated on a fixed conformation, does not undergo such fluctuations.

There are other features which, although not usable for the design, set a physically
clear difference between folding and non-folding sequences. First, the density of states of
designed sequences at low and intermediate energy is much higher than for random ones (see
Fig. 6), and it is higher the better the folding sequence is designed. This is equivalent
to stating that the conformational accessibility of the ground state of well designed sequences
is greater than for random or badly designed ones. In other words, asking for a deep funnel
to be carved in the energy landscape ensures that it is also a wide one, consistently with
previous findings. The second discriminant between 'good' and 'bad' folders is clearly
seen in the fraction of native attractive contacts vs. energy curve (Fig. 7). The linearity
of this curve for well designed sequences is, on the one hand, a striking confirmation that
simple topology-based models (Go models), which assume the energy gain to be proportional
to the fraction of native attractive contacts, do indeed capture the basic feature of the
energy landscape for a 'good' folding sequence, i.e. the existence of a funnel towards the
native state. On the other hand, it shows that our simple model is able to reproduce this
crucial feature without any a priori knowledge of the native state. Random or 'bad' folding
sequences instead fail to create the proper folding funnel. Note that both features can be
easily appreciated only in the microcanonical ensemble by looking at the behaviour of the entropy
(or of the fraction of native attractive contacts) as a function of energy.

The energy distribution per site of $s_1$ in the target conformation is also typical of good
folders, as found in the case of lattice models, the energy being concentrated mainly in
few "hot" sites (cf. inset to Fig. 7). On the contrary, the stabilization energy of a random
sequence is, as expected, evenly distributed over the ground state of the protein.



V. CONCLUSIONS 



In the case of simple lattice models, the thermodynamics of heteropolymers is reasonably 
well-understood, and there is an efficient algorithm to design folding sequences given a target 
conformation and an interaction matrix. We have studied a model with continuous degrees 
of freedom, showing that it is possible to design 20-letter sequences which fold stably and
fast to a given conformation, without the potential function containing any information
about the target conformation. A key ingredient of the model is a constraint on the total
number of contacts that each amino acid can build, which reflects geometric features of the 
amino acid neglected by a spherical-bead model. By means of such a model, it is possible 
to conclude that good folder sequences are those displaying on the target conformation an 
energy lower than a sequence-independent threshold. 
in the given conformation and in the native state, calculated over all pairs of residues. As a
reference, note that a random conformation displays a dRMSD of the order of
25 Å to the native conformation of SH3. On the other hand, the meaning of a dRMSD of 2.6 Å
can be appreciated from Fig. 1.

Only attractive contacts are counted and the same definition of contact as in the potential 
function (Eq. (1)) is used to calculate q. 
TABLE I: The features of the amino acids.



label   E_targ    E_gs     dRMSD [Å]   sequence
s1      -37.80   -46.96    2.6         GLLLLAANNWWVTRTDEEKKDYVSSSSDDTQTGGYNIEGLIFFRQVVPPEAHTYYSSSTT
s2      -37.27   -46.22    3.0         GLLLLEEEGWWNGTTVYYKFDESPDSSSDTNGVTNYYVLFITRRVQQAAADHTPISSSKT
s3      -36.20   -45.58    3.2         NKSAAAHQPERFTTVSSSEEPIYEVLLNWTYTTTRDSDSDDKFGWGGLLLQGTIYVNSVY
s4      -35.53   -44.45    3.4         QQHAASSSDDSDVFTVPPLGNLTNYYGIITKTTWLLFEGGAYTRNVDEEESSTLSVKYRW
s5      -34.85   -44.55    4.0         GDSAAAHQPERWWTTSSSEEPIYEVLLNVTTTFTRDVDSSDKVGFNGLLLQGTIYYNSKY
s6      -34.30   -44.31    4.7         QWAAHEEEDYRNFGTSSSYQGPGINSSFKTGYTTVDSDSLATRWVDLLLILWEPKNYTT
s7      -33.65   -44.22    4.8         SGLNLEEPGKKYFRRTAAWFVEGSDSSVGTTTTNQHQTALLLWVSDDYYYIIVEPDSSTN
s8      -23.67   -42.09    4.8         DSSSSEERDIFYTTTWYYQQGPLNSLLLGTVKTVDDIYSSAKTRWVGAAHGPTEEFNLVN
s9       -4.51   -42.36    5.5         NLILYEKLDNRFNKWWFLADSSPASGQVDRTTSTVSSTQEHTTYEEYVSGLGTIPDAVGY
s10      +8.26   -39.36    5.9         EYLSVIKTEDPKQSEYPSWLSEFFLLTIATGNTLYYDGVHAVTSSRNSGGDAVRNDTTWQ



TABLE II: Sequences with selected energies $E_{targ}$ on the SH3 target conformation displayed in
Fig. 1. The reported dRMSD is that of the ground-state conformation. Sequences $s_9$ and $s_{10}$ are
purely random.
FIG. 1: (a) The native structure in a $C_\alpha$ representation of SRC SH3 as obtained by crystallographic
experiments (pdb code 1FMK) and (b) the minimum energy structure of the sequence $s_1$,
corresponding to a dRMSD of 2.6 Å.

FIG. 2: The dRMSD (above), the energy (middle) and the fraction of native attractive contacts $q$
(below) as a function of the number of steps for a simulation of sequence $s_1$ starting from a random
conformation at $T = 0.120$.

FIG. 3: Probability distribution as a function of energy and dRMSD for sequence $s_1$ at temperature
0.120.

FIG. 4: The specific heat $C_v$ as a function of temperature for the sequence $s_1$.

FIG. 5: The average dRMSD as a function of energy, calculated for sequence $s_1$ (above) and $s_9$
(below). The error bars indicate the dRMSD standard deviation.

FIG. 6: The conformational entropy as a function of energy for some of the sequences listed in
Table II. Solid curves indicate folding sequences, dashed curves non-folding sequences. The dotted
curves are the prediction of the random energy model. In the inset, the distribution of dRMSD for
low-energy conformations sampled with sequence $s_1$ (solid curve) and $s_9$ (dashed curve).

FIG. 7: Fraction of native attractive contacts $q$ as a function of energy for sequences $s_1$ (solid
line), $s_8$ (dashed) and $s_9$ (dotted), representing respectively a good folder, a bad folder and a
random sequence. In the inset, the distribution of stabilization energy among the residues in the
target conformation of $s_1$.